(54) Method and apparatus for frame accurate access of digital audio-visual information



(57) A method and apparatus for use in a digital 
video delivery system is provided. A digital representa- 
tion of an audio-visual work, such as an MPEG file 
(104), is parsed to produce a tag file (106). The tag file 
(106) includes information about each of the frames in 
the audio-visual work. During the performance of the 
audio-visual work, data from the digital representation is 
sent from a video pump (130) to a decoder. Seek oper- 
ations are performed by causing the video pump (130) 
to stop transmitting data from the current position in the 
digital representation, and to start transmitting data 
from a new position in the digital representation. The 
information in the tag file (106) is inspected to deter- 
mine the new position from which to start transmitting 
data. To ensure that the data stream transmitted by the 
video pump (130) maintains compliance with the appli- 
cable video format, prefix data that includes appropriate 
information is transmitted by said video pump (130) 
prior to transmitting data from the new position. Fast 
and slow forward and rewind operations are performed 
by selecting video frames based on the information con- 
tained in the tag file (106) and the desired presentation 
rate, and generating a data stream containing data that 
represents the selected video frames. A video editor 
(502) is provided for generating a new video file (510) 
from pre-existing video files (104). The video editor 
(502) selects frames from the pre-existing video files 
(104) based on editing commands and the information 
contained in the tag files (106) of the pre-existing video 
files (104). A presentation rate, start position, end
position, and source file may be separately specified for
each sequence to be created by the video editor (502). 
Description

FIELD OF THE INVENTION 

[0001] The present invention relates to a method and apparatus for processing audio-visual information, and more
specifically, to a method and apparatus for providing non-sequential access to audio-visual information stored in a dig- 
ital format. 

BACKGROUND OF THE INVENTION 


[0002] In recent years, the media industry has expanded its horizons beyond traditional analog technologies. Audio, 
photographs, and even feature films are now being recorded or converted into digital formats. To encourage compati- 
bility between products, standard formats have been developed in many of the media categories. 
[0003] MPEG is a popular standard that has been developed for digitally storing audio-visual sequences and for supplying
the digital data that represents the audio-visual sequences to a client. For the purposes of explanation, the
MPEG-1 and MPEG-2 formats shall be used to explain problems associated with providing non-sequential access to
audio-visual information. The techniques employed by the present invention to overcome these problems shall also be 
described in the context of MPEG. However, it should be understood that MPEG-1 and MPEG-2 are merely two con- 
texts in which the invention may be applied. The invention is not limited to any particular digital format. 

[0004] In the MPEG format, video and audio information are stored in a binary file (an "MPEG file"). The video
information within the MPEG file represents a sequence of video frames. This video information may be intermixed with
audio information that represents one or more soundtracks. The amount of information used to represent a frame of 
video within the MPEG file varies greatly from frame to frame based both on the visual content of the frame and the 
technique used to digitally represent that content. In a typical MPEG file, the amount of digital data used to encode a
single video frame varies from 2K bytes to 50K bytes.

[0005] During playback, the audio-visual information represented in the MPEG file is sent to a client in a data stream 
(an "MPEG data stream"). An MPEG data stream must comply with certain criteria set forth in the MPEG standards. In 
MPEG-2, the MPEG data stream must consist of fixed-size packets. Specifically, each packet must be exactly 188
bytes. In MPEG-1, the size of each packet may vary, with a typical size being 2252 bytes. Each packet includes a
header that contains data to describe the contents of the packet. Because the amount of data used to represent each
frame varies and the size of packets does not vary, there is no correlation between the packet boundaries and the 
boundaries of the video frame information contained therein. 
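
The mismatch between frame boundaries and packet boundaries can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a file made up only of back-to-back 188-byte packets; the function name and the example offsets are hypothetical and are not taken from the MPEG specifications.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-2 packet size stated in the text


def packets_spanned(frame_start: int, frame_size: int):
    """Locate a frame's byte range within a run of fixed-size packets.

    Returns (first_packet, offset_in_first, last_packet, offset_in_last),
    where packets are numbered from 0 and offsets are byte positions
    inside the packet, counted from the start of the packet.
    """
    if frame_start < 0 or frame_size <= 0:
        raise ValueError("frame_start must be >= 0 and frame_size > 0")
    last_byte = frame_start + frame_size - 1
    first_packet, first_offset = divmod(frame_start, TS_PACKET_SIZE)
    last_packet, last_offset = divmod(last_byte, TS_PACKET_SIZE)
    return first_packet, first_offset, last_packet, last_offset


# Hypothetical 2K-byte frame starting at byte 1000 of the file.
print(packets_spanned(1000, 2048))
```

In this example the frame begins 60 bytes into one packet and ends 39 bytes into another, so neither end of the frame coincides with a packet boundary.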

[0006] MPEG employs three general techniques for encoding frames of video. The three techniques produce three
types of frame data: Intra-frame ("I-frame") data, Predicted frame ("P-frame") data and Bi-directional ("B-frame") data.
I-frame data contains all of the information required to completely recreate a frame. P-frame data contains information
that represents the difference between a frame and the frame that corresponds to the previous I-frame data or P-frame
data. B-frame data contains information that represents relative movement between preceding I or P-frame data and
succeeding I or P-frame data. These digital frame formats are described in detail in the following international
standards: ISO/IEC 13818-1, 2, 3 (MPEG-2) and ISO/IEC 11172-1, 2, 3 (MPEG-1). Documents that describe these
standards (hereafter referred to as the "MPEG specifications") are available from ISO/IEC Copyright Office, Case Postale 56,
CH-1211 Genève 20, Switzerland.

[0007] As explained above, video frames cannot be created from P and B-frame data alone. To recreate video frames 
represented in P-frame data, the preceding I or P-frame data is required. Thus, a P-frame can be said to "depend on" 
the preceding I or P-frame. To recreate video frames represented in B-frame data, the preceding I or P-frame data and 
the succeeding I or P-frame data are required. Thus, B-frames can be said to depend on the preceding and succeeding
I or P-frames. 

[0008] The dependencies described above are illustrated in Figure 1a. The arrows in Figure 1a indicate a "depends
on" relationship. Specifically, if a given frame depends on another frame, then an arrow points from the given frame to
the other frame. 

[0009] In the illustrated example, frame 20 represents an I-frame. I-frames do not depend on any other frames;
therefore no arrows point from frame 20. Frames 26 and 34 represent P-frames. A P-frame depends on the preceding I or
P-frame. Consequently, an arrow 36 points from P-frame 26 to I-frame 20, and an arrow 38 points from P-frame 34 to
P-frame 26.

[0010] Frames 22, 24, 28, 30 and 32 represent B-frames. B-frames depend on the preceding and succeeding I or
P-frames. Consequently, arrows 40 point from each of frames 22, 24, 28, 30 and 32 to the I or P-frame that precedes each
of the B-frames, and to each I or P-frame that follows each of the B-frames.
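
The "depends on" relationships can be derived mechanically from a sequence of frame types. The sketch below is illustrative only: it assumes the frames are listed in display order, reuses the reference numerals 20 through 34 as labels, and uses a hypothetical function name.

```python
def frame_dependencies(labels, frame_types):
    """Map each frame label to the labels of the frames it depends on.

    labels      -- frame identifiers in display order
    frame_types -- matching sequence of "I", "P" or "B"
    """
    # Positions of the I and P frames, which are the only frames others depend on.
    anchors = [i for i, t in enumerate(frame_types) if t in ("I", "P")]
    deps = {}
    for i, ftype in enumerate(frame_types):
        preceding = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if ftype == "I":
            needed = []                               # self-contained
        elif ftype == "P":
            needed = preceding[-1:]                   # nearest earlier I or P
        else:
            needed = preceding[-1:] + following[:1]   # nearest I or P on each side
        deps[labels[i]] = [labels[n] for n in needed]
    return deps


labels = [20, 22, 24, 26, 28, 30, 32, 34]
types = ["I", "B", "B", "P", "B", "B", "B", "P"]
for frame, needed in frame_dependencies(labels, types).items():
    print(frame, "depends on", needed)
```

The computed mapping matches the arrows described for Figure 1a: frame 26 depends on frame 20, frame 34 depends on frame 26, and each B-frame depends on the I or P-frames on either side of it.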

[0011] The characteristics of the MPEG format described above allow a large amount of audio-visual information to
be stored in a relatively small amount of digital storage space. However, these same characteristics make it difficult to
play the audio-visual content of an MPEG file in anything but a strict sequential manner. For example, it would be
extremely difficult to randomly access a video frame because the data for the video frame may start in the middle of one
MPEG packet and end in the middle of another MPEG packet. Further, if the frame is represented by P-frame data, the
frame cannot be recreated without processing the I and P-frames immediately preceding the P-frame data. If the frame
is represented by B-frame data, the frame cannot be recreated without processing the I and P-frames immediately
preceding the B-frame data, and the P-frame or I-frame immediately following the B-frame data.

[0012] As would be expected, the viewers of digital video desire the same functionality from the providers of digital 
video as they now enjoy while watching analog video tapes on video cassette recorders. For example, viewers want to 
be able to make the video jump ahead, jump back, fast forward, fast rewind, slow forward, slow rewind and freeze frame. 
However, due to the characteristics of the MPEG video format, MPEG video providers have only been able to offer
partial implementations of some of these features.

[0013] Some MPEG providers have implemented fast forward functionality by generating fast forward MPEG files. A
fast forward MPEG file is made by recording in MPEG format the fast-forward performance of an analog version of an
audio-visual sequence. Once a fast forward MPEG file has been created, an MPEG server can simulate fast forward
during playback by transmitting an MPEG data stream to a user from data in both the normal-speed MPEG file and the
fast forward MPEG file. Specifically, the MPEG server switches between reading from the normal MPEG file and
reading from the fast forward MPEG file in response to fast forward and normal play commands generated by the user. This
same technique can be used to implement fast rewind, forward slow motion and backward slow motion. 
[0014] The separate-MPEG file implementation of fast forward described above has numerous disadvantages.
Specifically, the separate-MPEG file implementation requires the performance of a separate analog-to-MPEG conversion
for each playback rate that will be supported. This drawback is significant because the analog-to-MPEG conversion
process is complex and expensive. A second disadvantage is that the use of multiple MPEG files can more than double
the digital storage space required for a particular audio-visual sequence. A 2x fast forward MPEG file will be
approximately half the size of the normal speed MPEG file. A half-speed slow motion MPEG file will be approximately twice the
size of the normal speed MPEG file. Since a typical movie takes 2 to 4 gigabytes of disk storage, these costs are
significant.
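
The storage cost of the separate-file approach follows from the sizes given above: a file recorded at playback rate r is roughly 1/r times the size of the normal-speed file. The sketch below is a rough estimate under that assumption only; the function name and the 3-gigabyte example movie are hypothetical.

```python
def total_storage_gb(normal_size_gb: float, extra_rates):
    """Estimate storage for a normal-speed file plus one file per extra rate.

    Each extra file is assumed to be about normal_size_gb / rate,
    e.g. half the size at 2x and twice the size at 0.5x.
    """
    if any(rate <= 0 for rate in extra_rates):
        raise ValueError("playback rates must be positive")
    return normal_size_gb * (1 + sum(1.0 / rate for rate in extra_rates))


# Hypothetical 3 GB movie with a 2x fast forward file and a half-speed file.
print(total_storage_gb(3.0, [2.0, 0.5]))
```

For this example the estimate is 10.5 gigabytes, three and a half times the space needed for the normal-speed file alone, and separate rewind files would add more.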

[0015] A third disadvantage with the separate-MPEG file approach is that only the playback rates that are specifically
encoded will be available to the user. The technique does not support rates that are faster than, slower than, or between
the specifically encoded rates. A fourth disadvantage is that the separate-MPEG file approach requires the existence
of a complete analog version of the target audio-visual sequence. Consequently, the technique cannot be applied to live
feeds, such as live sports events fed through an MPEG encoder and out to users in real-time. 
[0016] Based on the foregoing, it is clearly desirable to provide a method and apparatus for sequentially displaying 
non-sequential frames of a digital video. It is further desirable to provide such non-sequential access in a way that does 
not require the creation and use of multiple digital video files. It is further desirable to provide such access for real-time
feeds as well as stored audio-visual content.

SUMMARY OF THE INVENTION 

[0017] A method and apparatus for use in a digital video delivery system is provided. A digital representation of an
audio-visual work, such as an MPEG file, is parsed to produce a tag file. The tag file includes information about each
of the frames in the audio-visual work. Specifically, the tag file contains information about the state of one or more
state machines that are used to decode the digital representation. The state information will vary depending on the
specific technique used to encode the audio-visual work. For MPEG-2 files, for example, the tag file includes information
about the state of the program elementary stream state machine, the video state machine, and the transport layer state
machine.

[0018] During the performance of the audio-visual work, data from the digital representation is sent from a video pump
to a decoder. According to one embodiment of the invention, the information in the tag file is used to perform seek, fast
forward, fast rewind, slow forward and slow rewind operations during the performance of the audio-visual work.
[0019] Seek operations are performed by causing the video pump to stop transmitting data from the current position
in the digital representation, and to start transmitting data from a new position in the digital representation. The
information in the tag file is inspected to determine the new position from which to start transmitting data. To ensure that the
data stream transmitted by the video pump maintains compliance with the applicable video format, prefix data that
includes appropriate header information is transmitted by said video pump prior to transmitting data from the new
position.
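
A seek lookup against per-frame tag information might be sketched as follows. This is an illustration only: the `TagEntry` fields, the function name, and the policy of backing up to the nearest earlier I-frame are assumptions made for the example, not a description of the claimed method.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TagEntry:
    time_ms: int         # when the frame is shown during normal playback
    start_position: int  # byte offset of the frame data in the file
    frame_type: str      # "I", "P" or "B"


def find_seek_entry(entries, target_ms):
    """Return the entry from which transmission could resume for target_ms.

    entries must be sorted by time_ms. The latest frame at or before the
    target is located, then the search backs up to an I-frame (assumed
    policy) so that the decoder receives a self-contained frame first.
    """
    if not entries:
        raise ValueError("tag information is empty")
    times = [e.time_ms for e in entries]
    index = max(bisect_right(times, target_ms) - 1, 0)
    while index > 0 and entries[index].frame_type != "I":
        index -= 1
    return entries[index]


entries = [
    TagEntry(0, 0, "I"),
    TagEntry(33, 12000, "B"),
    TagEntry(67, 15000, "P"),
    TagEntry(100, 30000, "I"),
    TagEntry(133, 41000, "B"),
]
print(find_seek_entry(entries, 120).start_position)
```

The returned start position is the "new position" handed to the video pump; the prefix data sent ahead of it is a separate step.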

[0020] Fast forward, fast rewind, slow forward and slow rewind operations are performed by selecting video frames
based on the information contained in the tag file and the desired presentation rate, and generating a data stream
containing data that represents the selected video frames. The selection process takes into account a variety of factors,
including the data transfer rate of the channel on which the data is to be sent, the frame type of the frames, a minimum
padding rate, and the possibility of a buffer overflow on the decoder. Prefix and suffix data are inserted into the
transmitted data stream before and after the data for each frame in order to maintain compliance with the data stream format
expected by the decoder.
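
One way to picture rate-based frame selection is as a running byte budget. The sketch below is a simplified illustration and not the selection process of the invention: it assumes that only I-frames are candidates, that the budget grows by a fixed allowance for every source frame passed over, and it ignores padding and decoder buffer limits. All names and figures are hypothetical.

```python
def select_frames(frames, channel_bytes_per_sec, frames_per_sec, rate):
    """Pick frame indexes to send during fast forward at the given rate.

    frames -- list of (frame_type, size_in_bytes) in display order
    rate   -- presentation rate multiplier, e.g. 2.0 for 2x

    At rate r, each source frame occupies 1 / (frames_per_sec * r) seconds
    of presentation time, which is how many bytes the channel can carry
    while that frame is being passed over.
    """
    if rate <= 0 or frames_per_sec <= 0:
        raise ValueError("rate and frames_per_sec must be positive")
    allowance = channel_bytes_per_sec / (frames_per_sec * rate)
    budget = 0.0
    selected = []
    for index, (frame_type, size) in enumerate(frames):
        budget += allowance
        # Assumed policy: send an I-frame only when the budget covers it.
        if frame_type == "I" and size <= budget:
            selected.append(index)
            budget -= size
    return selected


frames = [("I", 40000), ("B", 5000), ("P", 15000),
          ("I", 40000), ("B", 5000), ("P", 15000), ("I", 40000)]
print(select_frames(frames, 1_200_000, 30, 2.0))
```

With a slower channel the same call selects fewer frames, which shows why the data transfer rate is one of the factors in the selection.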

[0021] A video editor is provided for generating a new video file from pre-existing video files. The video editor selects 
frames from the pre-existing video files based on editing commands and the information contained in the tag files of the
pre-existing video files. A presentation rate, start position, end position, and source file may be separately specified for
each sequence to be created by the video editor. The video editor adds prefix and suffix data between video data to
ensure that the new video file conforms to the desired format. Significantly, the new video files created by this method
are created without the need to perform additional analog-to-digital encoding. Further, since analog-to-digital encoding
is not performed, the new file can be created even when one does not have access to the original analog recordings of
the audio-visual works. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0022] The present invention is illustrated by way of example, and not by way of limitation, in the figures of the
accompanying drawings and in which like reference numerals refer to similar elements and in which:

Figure 1a is a diagram illustrating the dependencies between different types of frames in an MPEG data stream;
Figure 1b is a block diagram of an audio-visual information delivery system according to an embodiment of the
present invention;
Figure 2a illustrates the various layers in an MPEG file;
Figure 2b illustrates the contents of a tag file generated according to an embodiment of the invention;
Figure 2c illustrates the tag information generated for each frame in an MPEG-1 file;
Figure 3a illustrates the commands sent from the stream server to the video pump in response to a seek request
according to an embodiment of the invention;
Figure 3b illustrates the data generated by the video pump to a client in response to the commands illustrated in
Figure 3a;
Figure 4a illustrates the commands sent from the stream server to the video pump during a rate-specified playback
operation according to one embodiment of the invention;
Figure 4b illustrates the data generated by the video pump to a client in response to the commands illustrated in
Figure 4a;
Figure 5 illustrates an MPEG editor configured to perform non-interactive MPEG editing according to an
embodiment of the invention;
Figure 6 is a flow chart illustrating the operation of the MPEG editor of Figure 5 according to an embodiment of the
invention; and
Figure 7 is a block diagram illustrating a multi-disk MPEG playback system according to an embodiment of the
invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 


[0023] In the following description, the various features of the invention shall be discussed under topic headings that
appear in the following order: 

I. OVERVIEW
II. TAG FILE GENERATION
III. DIGITAL AUDIO/VIDEO FILE STRUCTURE
IV. TAG FILE CONTENTS
V. SEEK OPERATIONS
VI. PREFIX DATA
VII. PACKET DISCONTINUITIES
VIII. BUFFER LIMITATIONS
IX. SPECIFIED-RATE PLAYBACK OPERATIONS
X. BIT BUDGETING
XI. FRAME TYPE CONSTRAINTS
XII. SUFFIX DATA
XIII. SLOW MOTION OPERATIONS
XIV. REWIND OPERATIONS
XV. RUNTIME COMMUNICATION
XVI. FRAME ACCURATE POSITIONING
XVII. DISK ACCESS CONSTRAINTS
XVIII. VARIABLE RATE PLAYBACK OPERATIONS
XIX. NON-INTERACTIVE DIGITAL AUDIO-VIDEO EDITING
XX. DISTRIBUTED SYSTEM

I. OVERVIEW 

[0024] Figure 1b is a block diagram illustrating an audio-visual information delivery system 100 according to one
embodiment of the present invention. Audio-visual information delivery system 100 contains a plurality of clients (1 - n)
160, 170 and 180. The clients (1 - n) 160, 170 and 180 generally represent devices configured to decode audio-visual
information contained in a stream of digital audio-visual data. For example, the clients (1 - n) 160, 170 and 180 may be
set top converter boxes coupled to an output display, such as a television.

[0025] As shown in Figure 1b, the audio-visual information delivery system 100 also includes a stream server 110
coupled to a control network 120. Control network 120 may be any network that allows communication between two or
more devices. For example, control network 120 may be a high bandwidth network, an X.25 circuit or an Electronic
Industries Association (EIA) 232 (RS-232) serial line.

[0026] The clients (1 - n) 160, 170 and 180, also coupled to the control network 120, communicate with the stream
server 110 via the control network 120. For example, clients 160, 170 and 180 may transmit requests to initiate the
transmission of audio-visual data streams, transmit control information to affect the playback of ongoing digital
audio-visual transmissions, or transmit queries for information. Such queries may include, for example, requests for
information about which audio-visual data streams are currently available for service.

[0027] The audio-visual information delivery system 100 further includes a video pump 130, a mass storage device
140, and a high bandwidth network 150. The video pump 130 is coupled to the stream server 110 and receives
commands from the stream server 110. The video pump 130 is coupled to the mass storage device 140 such that the video
pump 130 stores and retrieves data from the mass storage device 140. The mass storage device 140 may be any type
of device or devices used to store large amounts of data. For example, the mass storage device 140 may be a magnetic
storage device or an optical storage device. The mass storage device 140 is intended to represent a broad category of
non-volatile storage devices used to store digital data, which are well known in the art and will not be described further.
While networks 120 and 150 are illustrated as different networks for the purpose of explanation, networks 120 and 150
may be implemented on a single network.

[0028] In addition to communicating with the stream server 110, the clients (1 - n) 160, 170 and 180 receive
information from the video pump 130 through the high bandwidth network 150. The high bandwidth network 150 may be any
type of circuit-style network link capable of transferring large amounts of data. A circuit-style network link is
configured such that the destination of the data is guaranteed by the underlying network, not by the transmission protocol. For
example, the high bandwidth network 150 may be an asynchronous transfer mode (ATM) circuit or a physical type of 
line, such as a T1 or E1 line. In addition, the high bandwidth network 150 may utilize a fiber optic cable, twisted pair 
conductors, coaxial cable, or a wireless communication system, such as a microwave communication system. 
[0029] The audio-visual information delivery system 100 of the present invention permits a server, such as the video
pump 130, to transfer large amounts of data from the mass storage device 140 over the high bandwidth network 150 to
the clients (1 - n) 160, 170 and 180 with minimal overhead. In addition, the audio-visual information delivery system 100
permits the clients (1 - n) 160, 170 and 180 to transmit requests to the stream server 110 using a standard network
protocol via the control network 120. In a preferred embodiment, the underlying protocol for the high bandwidth network
150 and the control network 120 is the same. The stream server 110 may consist of a single computer system, or may
consist of a plurality of computing devices configured as servers. Similarly, the video pump 130 may consist of a single
server device, or may include a plurality of such servers.

[0030] To receive a digital audio-visual data stream from a particular digital audio-visual file, a client (1 - n) 160, 170
or 180 transmits a request to the stream server 110. In response to the request, the stream server 110 transmits
commands to the video pump 130 to cause video pump 130 to transmit the requested digital audio-visual data stream to the
client that requested the digital audio-visual data stream.

[0031] The commands sent to the video pump 130 from the stream server 110 include control information specific to
the client request. For example, the control information identifies the desired digital audio-visual file, the beginning offset
of the desired data within the digital audio-visual file, and the address of the client. In order to create a valid digital
audio-visual stream at the specified offset, the stream server 110 also sends "prefix data" to the video pump 130 and
requests the video pump 130 to send the prefix data to the client. As shall be described in greater detail hereafter, prefix
data is data that prepares the client to receive digital audio-visual data from the specified location in the digital
audio-visual file.

[0032] The video pump 130, after receiving the commands and control information from the stream server 110, begins
to retrieve digital audio-visual data from the specified location in the specified digital audio-visual file on the mass
storage device 140. For the purpose of explanation, it shall be assumed that system 100 delivers audio-visual information
in accordance with one or more of the MPEG formats. Consequently, video pump 130 will retrieve the audio-visual data 
from an MPEG file 104 on the mass storage device 140. 

[0033] The video pump 130 transmits the prefix data to the client, and then seamlessly transmits MPEG data retrieved
from the mass storage device 140 beginning at the specified location to the client. The prefix data includes a packet
header which, when followed by the MPEG data located at the specified position, creates an MPEG compliant transition
packet. The data that follows the first packet is retrieved sequentially from the MPEG file 104, and will therefore
constitute a series of MPEG compliant packets. The video pump 130 transmits these packets to the requesting client via the
high bandwidth network 150.

[0034] The requesting client receives the MPEG data stream, beginning with the prefix data. The client decodes the 
MPEG data stream to reproduce the audio-visual sequence represented in the MPEG data stream. 
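
The order of transmission described above (prefix data first, then file data read sequentially from the specified offset) can be sketched as a generator. This is a minimal illustration with hypothetical names; it does not construct a real MPEG packet header, and the prefix bytes in the example are placeholders.

```python
import io
from typing import BinaryIO, Iterator


def pump_stream(source: BinaryIO, offset: int, prefix: bytes,
                chunk_size: int = 188) -> Iterator[bytes]:
    """Yield the prefix data, then the file contents starting at offset.

    source     -- seekable binary file standing in for the stored MPEG file
    offset     -- position supplied by the stream server
    prefix     -- bytes the stream server asked to have sent first
    chunk_size -- read size; 188 is used only as an example value
    """
    if prefix:
        yield prefix
    source.seek(offset)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in "file" and placeholder prefix bytes.
stored = io.BytesIO(b"0123456789")
print(b"".join(pump_stream(stored, 4, b"HDR", chunk_size=3)))
```

The client sees a single continuous stream; the point at which the prefix ends and the file data begins is not visible to it.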

II. TAG FILE GENERATION 


[0035] System 100 includes a tag file generator 112. The tag file generator 112 generates a tag file 106 from the
MPEG file 104. For stored MPEG content, the tag file generation operation is performed by tag file generator 112
"off-line" (i.e. prior to any client request for MPEG data from the MPEG file 104). However, in certain situations, such as
real-time MPEG feeds, tag file generation is performed in real-time during receipt of the MPEG data stream. Consequently,
in the preferred embodiment, tag file generator 112 generates tag file 106 in real-time or faster. Tag file generation rates
may be increased by parallelization of the tag file operation.

[0036] Tag file generator 112, stream server 110 and video pump 130 are illustrated as separate functional units for
the purpose of explanation. However, the particular division of functionality between units may vary from
implementation to implementation. The present invention is not limited to any particular division of functionality. For example, tag
file generator 112 is illustrated as a stand-alone unit. However, in one embodiment, tag file generator 112 may be
incorporated into an MPEG encoder. Such an MPEG encoder would generate the information contained in tag file 106
simultaneously with the generation of the information contained in MPEG file 104. An implementation that combines the MPEG
encoding process with the tag file generation process may increase efficiency by eliminating the need to perform
redundant operations. Such efficiency gains are particularly useful when processing audio-visual feeds in real-time.

[0037] The tag file 106 contains control information that is used by stream server 110 to implement fast forward, fast
rewind, slow forward, slow rewind and seek operations. The use of the tag file 106 to perform these operations shall be
described in greater detail below. The tag file 106 contains general information about the MPEG file 104 and specific 
information about each of the video frames in the MPEG file 104. Prior to discussing in detail the contents of the tag file 
106, the general structure of MPEG file 104 shall be described with reference to Figure 2a. 


III. MPEG FILE STRUCTURE 

[0038] Digital audio-visual storage formats, whether compressed or not, use state machines and packets of various 
structures. The techniques described herein apply to all such storage formats. While the present invention is not limited 
to any particular digital audio-visual format, the MPEG-2 transport file structure shall be described for the purposes of
illustration. 

[0039] Figure 2a illustrates the structure of an MPEG-2 transport file 104 in greater detail. The data
within MPEG file 104 is packaged into three layers: a program elementary stream ("PES") layer, a transport layer, and
a video layer. These layers are described in detail in the MPEG-2 specifications. At the PES layer, MPEG file 104
consists of a sequence of PES packets. At the transport layer, the MPEG file 104 consists of a sequence of transport
packets. At the video layer, MPEG file 104 consists of a sequence of picture packets. Each picture packet contains the data
for one frame of video.

[0040] Each PES packet has a header that identifies the length and contents of the PES packet. In the illustrated
example, a PES packet 250 contains a header 248 followed by a sequence of transport packets 251-262. PES packet
boundaries coincide with valid transport packet boundaries. Each transport packet contains exclusively one type of
data. In the illustrated example, transport packets 251, 256, 258, 259, 260 and 262 contain video data. Transport
packets 252, 257 and 261 contain audio data. Transport packet 253 contains control data. Transport packet 254 contains
timing data. Transport packet 255 is a padding packet.

[0041] Each transport packet has a header. The header includes a program ID ("PID") for the packet. Packets
assigned PID 0 are control packets. For example, packet 253 may be assigned PID 0. Other packets, including other
control packets, are referenced in the PID 0 packets. Specifically, PID 0 control packets include tables that indicate the
packet types of the packets that immediately follow the PID 0 control packets. For all packets which are not PID 0 control
packets, the headers contain PIDs which serve as pointers into the table contained in the PID 0 control packet that
most immediately preceded the packets. For example, the type of data contained in a packet with a PID 100 would be
determined by inspecting the entry associated with PID 100 in the table of the PID 0 control packet that most recently
preceded the packet.
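
The PID lookup described above amounts to keeping track of the most recent PID 0 table while walking the packets. The following sketch models that idea with plain Python values; the tuple layout, the dictionary standing in for the table, and the function name are assumptions for illustration and do not reflect the real packet syntax.

```python
def classify_packets(packets):
    """Return the data type of each packet in stream order.

    packets -- iterable of (pid, payload) pairs. For PID 0 the payload is
               modeled as a dict mapping PID -> type name; for any other
               PID the payload is ignored here.
    """
    current_table = {}   # table from the most recent PID 0 packet
    types = []
    for pid, payload in packets:
        if pid == 0:
            current_table = payload
            types.append("control")
        else:
            # Look the PID up in the table that most recently preceded it.
            types.append(current_table.get(pid, "unknown"))
    return types


stream = [
    (0, {100: "video", 101: "audio"}),
    (100, b""),
    (101, b""),
    (0, {100: "audio"}),
    (100, b""),
]
print(classify_packets(stream))
```

In the example the same PID 100 resolves to different types before and after the second PID 0 packet, which is why the state of this table matters when transmission starts from the middle of a file.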

[0042] In the video layer, the MPEG file 104 is divided according to the boundaries of frame data. As mentioned
above, there is no correlation between the boundaries of the data that represent video frames and the transport packet
boundaries. In the illustrated example, the frame data for one video frame "F" is located as indicated by brackets 270.
Specifically, the frame data for frame "F" is located from a point 280 within video packet 251 to the end of video packet
251, in video packet 256, and from the beginning of video packet 258 to a point 282 within video packet 258. Therefore,
points 280 and 282 represent the boundaries for the picture packet for frame "F". The frame data for a second video
frame "G" is located as indicated by brackets 272. The boundaries for the picture packet for frame "G" are indicated by
bracket 276.

[0043] Structures analogous to those described above for MPEG-2 transport streams also exist in other digital
audio-visual storage formats, including MPEG-1, QuickTime, AVI, Indeo, Cinepak, Proshare, H.261 and fractal formats. In the
preferred embodiment, indicators of video access points, time stamps, file locations, etc. are stored such that multiple
digital audio-visual storage formats can be accessed by the same server to simultaneously serve different clients from
a wide variety of storage formats. Preferably, all of the format specific information and techniques are incorporated in
the tag generator and the stream server. All of the other elements of the server are format independent.

IV. TAG FILE CONTENTS 


[0044] The contents of an exemplary tag file 106 shall now be described with reference to Figure 2b. In Figure 2b, the 
tag file 106 includes a file type identifier 202, a length indicator 204, a bit rate indicator 206, a play duration indicator 
208, a frame number indicator 210, stream access information 212 and an initial MPEG time offset 213. File type
identifier 202 indicates the physical wrapping on the MPEG file 104. For example, file type identifier 202 would indicate
whether MPEG file 104 is an MPEG-2 or an MPEG-1 file.

[0045] Length indicator 204 indicates the length of the MPEG file 104. Bit rate indicator 206 indicates the bit rate at 
which the contents of the MPEG file 104 should be sent to a client during playback. The play duration indicator 208 
specifies, in milliseconds, the amount of time required to play back the entire contents of MPEG file 104 during a normal 
playback operation. Frame number indicator 210 indicates the total number of frames represented in MPEG file 104. 

[0046] Stream access information 212 is information required to access the video and audio streams stored within
MPEG file 104. Stream access information 212 includes a video elementary stream ID and an audio elementary stream 
ID. For MPEG-2 files, stream access information 212 also includes a video PID and an audio PID. The tag file header 
may also contain other information that may be used to implement features other than those provided by the present 
invention. 

[0047] In addition to the general information described above, the tag file 106 contains an entry for each frame within
the MPEG file 104. The entry for a video frame includes information about the state of the various MPEG layers relative 
to the position of the data that represents the frame. For an MPEG-2 file, each entry includes the state of the MPEG-2 
transport state machine, the state of the program elementary stream state machine and the state of the video state 
machine. For an MPEG-1 file, each entry includes the current state of the Pack system MPEG stream and the state of
the video state machine.

[0048] Tag file entry 214 illustrates in greater detail the tag information that is stored for an individual MPEG-2 video
frame "F". With respect to the state of the program elementary stream state machine, the tag entry 214 includes the
information indicated in Table 1.



TABLE 1

DATA | MEANING
PES OFFSET AT THE START OF PICTURE 217 | The offset, within the PES packet that contains the frame data for frame "F", of the first byte of the frame data for frame "F".
PES OFFSET AT THE END OF PICTURE 219 | The offset between the last byte in the frame data for frame "F" and the end of the PES packet in which the frame data for frame "F" resides.




[0049] With respect to the state of the video state machine, tag entry 214 includes the information indicated in Table 2.






EP 0 964 578 A2 



TABLE 2

DATA | MEANING
PICTURE SIZE 220 | The size of the picture packet for frame "F".
START POSITION 226 | The location within the MPEG file of the first byte of the data that corresponds to frame "F".
TIME VALUE 228 | The time, relative to the beginning of the movie, when frame "F" would be displayed during a normal playback of MPEG file 104.
FRAME TYPE 232 | The technique used to encode the frame (e.g. I-frame, P-frame or B-frame).
TIMING BUFFER INFORMATION 238 | Indicates how full the buffer of the decoder is (sent to the decoder to determine when information should be moved out of the buffer in order to receive newly arriving information).



[0050] With respect to the state of the transport layer state machine, tag entry 214 includes the information indicated
in Table 3.



TABLE 3

DATA | MEANING
START OFFSET 234 | The distance between the first byte in the frame data and the start of the transport packet in which the first byte resides.
# OF NON-VIDEO PACKETS 222 | The number of non-video packets (i.e. audio packets, padding packets, control packets and timing packets) that are located within the picture packet for frame "F".
# OF PADDING PACKETS 224 | The number of padding packets that are located within the picture packet for frame "F".
END OFFSET 236 | The distance between the last byte in the frame data and the end of the packet in which the last byte resides.
CURRENT CONTINUITY COUNTER 215 | The continuity value associated with frame "F".
DISCONTINUITY FLAG 230 | Indicates whether there is a discontinuity in time between frame "F" and the frame represented in the previous tag entry.



[0051] Assume, for example, that entry 214 is for the frame "F" of Figure 2a. The size 220 associated with frame "F"
would be the bits encompassed by bracket 274. The number 222 of non-video packets would be five (packets 252, 253,
254, 255 and 257). The number 224 of padding packets would be one (packet 255). The start position 226 would be the
distance between the beginning of MPEG file 104 and point 280. The start offset 234 would be the distance between
the start of packet 251 and point 280. The end offset 236 would be the distance between point 282 and the end of
packet 258.
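
As an illustration only, the per-frame MPEG-2 tag entry described in Tables 1 to 3 could be modelled as a single record. The sketch below is not the tag file's on-disk format, which the text does not specify. The class name, field names, integer types and the millisecond unit for the time value are all assumptions; only the reference numerals in the comments and the packet counts in the example come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class FrameType(Enum):
    """Encoding technique of a frame (frame type 232)."""
    I = "I"
    P = "P"
    B = "B"


@dataclass(frozen=True)
class Mpeg2TagEntry:
    """Hypothetical in-memory view of one MPEG-2 tag entry (entry 214)."""
    # Program elementary stream state (Table 1)
    pes_offset_at_start: int   # 217
    pes_offset_at_end: int     # 219
    # Video state (Table 2)
    picture_size: int          # 220
    start_position: int        # 226, byte offset within the MPEG file
    time_value_ms: int         # 228, unit assumed for this sketch
    frame_type: FrameType      # 232
    timing_buffer_info: int    # 238
    # Transport layer state (Table 3)
    start_offset: int          # 234
    non_video_packets: int     # 222
    padding_packets: int       # 224
    end_offset: int            # 236
    continuity_counter: int    # 215
    discontinuity: bool        # 230


# Example for frame "F": the packet counts (five non-video packets, one of
# them padding) follow the description; every other value is invented.
entry_f = Mpeg2TagEntry(
    pes_offset_at_start=19, pes_offset_at_end=0,
    picture_size=1316, start_position=4_000, time_value_ms=0,
    frame_type=FrameType.I, timing_buffer_info=0,
    start_offset=10, non_video_packets=5, padding_packets=1,
    end_offset=42, continuity_counter=3, discontinuity=False,
)
```

Grouping the fields by state machine mirrors the three tables, so a format-specific tag generator for another storage format could supply a different record type without affecting format-independent components.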

[0052] Figure 2c illustrates the tag information generated for each frame in an MPEG-1 file. Referring to Figure 2c, 
entry 214 includes data indicating the state of three state machines: a system state machine, a pack state machine, and 
a video state machine. Specifically, tag entry 214 includes the information shown in Table 4.



TABLE 4

DATA | MEANING
AMOUNT OF NON-VIDEO DATA 221 | The amount of non-video data (in bytes) contained within the start and end boundaries of the frame data for frame "F".
AMOUNT OF PADDING DATA 223 | The amount of padding data (in bytes) contained within the start and end boundaries of the frame data for frame "F".
PACK OFFSET AT START 225 | The offset between the start boundary of the frame data for frame "F" and the beginning of the pack packet that contains the start boundary for frame "F".
PACK REMAINING AT START 227 | The distance between the start boundary for frame "F" and the end of the pack packet that contains the start boundary of frame "F".
PACK OFFSET AT END 229 | The offset between the end boundary for frame "F" and the beginning of the packet that contains the end boundary for frame "F".
PACK REMAINING AT END 231 | The distance between the end boundary for frame "F" and the end of the pack packet that contains the end boundary of frame "F".
PICTURE SIZE 233 | The distance (in bytes) between the start boundary for frame "F" and the end boundary for frame "F".
PICTURE START POS 235 | The distance between the start of the MPEG-1 file and the start boundary for frame "F".
PICTURE END POS 237 | The position, relative to the beginning of the MPEG-1 file, of the end boundary for frame "F".
FRAME TYPE 239 | The technique used to encode the data that represents frame "F".
TIME VALUE 241 | The time, relative to the beginning of the movie, when frame "F" would be displayed during a normal playback of MPEG file 104.
TIMING BUFFER INFO 243 | Indicates how full the decoder buffer is (sent to the decoder to determine when information should be moved out of the buffer in order to receive newly arriving information).



[0053] As explained above with reference to MPEG-1 and MPEG-2 formats, the tag information includes data indicating
the state of the relevant state machines at the beginning of video frames. However, the state machines employed
by other digital audio-visual formats differ from those described above just as the state machines employed in the
MPEG-1 format differ from those employed in MPEG-2. Consequently, the specific tag information stored for each
frame of video will vary based on the digital audio-video format of the file to which it corresponds.



V. SEEK OPERATIONS 


[0054] Having explained the contents of tag file 106, the use of tag file 106 to perform seek operations shall now be
described. When a client wishes to perform a seek operation, the client transmits a seek operation request to stream
server 110. The seek operation request may specify, for example, to jump ahead in the MPEG sequence to a position
five minutes ahead of the current playing position. In response to the request, stream server 110 inspects the tag file
106 to determine the I-frame (the "target frame") that would be playing in five minutes if the playback operation
proceeded at a normal rate. The target frame may be easily determined by inspecting the time value 228 and frame type
232 information stored in tag file 106.

[0055] When the target frame is determined, stream server 110 determines the position within the MPEG file 104 of
the frame data that corresponds to the target frame (the "target position"). Stream server 110 performs this
determination by reading the start position 226 stored in the entry in tag file 106 that corresponds to the target frame.
Significantly, all of the operations performed by stream server 110 are performed without the need to access MPEG file 104.
This allows for the stream server 110 and the video pump 130 to be distributed among the various servers in the server
complex.
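
The lookup of the target frame and target position can be sketched as follows. This is a minimal illustration rather than the stream server's actual procedure: the names `TagEntry` and `find_target_frame` are hypothetical, entries are assumed to be sorted by time value, and "the I-frame that would be playing" is interpreted here as the last I-frame at or before the requested time, which is only one plausible reading.

```python
from bisect import bisect_right
from typing import NamedTuple, Sequence


class TagEntry(NamedTuple):
    time_value: float     # seconds from the start of the movie (228)
    frame_type: str       # "I", "P" or "B" (232)
    start_position: int   # byte offset in the MPEG file (226)


def find_target_frame(entries: Sequence[TagEntry], target_time: float) -> TagEntry:
    """Return the last I-frame whose time value is <= target_time.

    Falls back to the first I-frame in the file when no I-frame
    precedes the requested time. Only tag data is consulted; the MPEG
    file itself is never read.
    """
    times = [e.time_value for e in entries]
    idx = bisect_right(times, target_time) - 1
    for i in range(idx, -1, -1):
        if entries[i].frame_type == "I":
            return entries[i]
    for e in entries:
        if e.frame_type == "I":
            return e
    raise ValueError("tag file contains no I-frames")


entries = [
    TagEntry(0.0, "I", 0),
    TagEntry(0.5, "P", 40_000),
    TagEntry(1.0, "I", 60_000),
    TagEntry(1.5, "P", 100_000),
]
target = find_target_frame(entries, 1.7)
target_position = target.start_position  # where the play command would begin
```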

[0056] For the purpose of explanation, various components of system 100 are said to read data from a particular
storage medium. For example, tag file generator 112 and video pump 130 are described as reading data from MPEG file
104 located on mass storage device 140, and stream server 110 is described as reading data from tag file 106 stored
on mass storage device 140. However, when data is to be frequently accessed, it is typically cached in a faster,
temporary storage medium such as dynamic memory. Rather than read the data directly from the slower storage, the
components read the data from the faster temporary storage. In the preferred embodiment, at least a portion of the tag file 106
is stored in a cache memory to reduce the number of disk accesses performed by stream server 110.

[0057] Once the target position has been determined, the stream server 110 constructs prefix data for the transition.
As mentioned above, prefix data is data that must be inserted into the MPEG data stream prior to a transition to ensure
that the MPEG data stream remains MPEG compliant. Prefix data shall be described in greater detail below.

[0058] Once stream server 110 constructs the prefix data, stream server 110 transmits commands to video pump 130
to instruct video pump 130 to transition from the current position in the MPEG file to the target position. For a seek
operation, the commands generated by stream server 110 will typically include an insert command and a play command.
The insert command instructs the video pump 130 to cease transmission of MPEG data from the current position, and
to transmit the prefix data. This process effectively "inserts" the prefix data into the MPEG data stream. The play
command instructs the video pump 130 to begin transmitting data starting at the target position within the MPEG file 104.
The video pump 130 inserts this data in a byte-contiguous way such that the client does not see any boundary between
the prefix data, the MPEG data, and the suffix data.

[0059] Figure 3a illustrates the commands sent by the stream server 110 to the video pump 130 in
response to a seek request from a client. In the illustrated example, the stream server 110 transmits two commands
302 to the video pump 130. The first command is an insert command instructing video pump 130 to insert
"PREFIX_DATA" into the MPEG data stream that the video pump 130 is sending to a client.

[0060] The second command is a play command. The play command instructs the video pump 130 to transmit data
beginning at the position "START_POS". START_POS is the position within MPEG file 104 of the first byte of the target
frame.

[0061] In the preferred embodiment, the "play" instruction supports a "begin position" parameter and an "end position" 
parameter. In response to a play instruction, the video pump 130 transmits data from the MPEG file beginning at the 
begin position, and continues to transmit data from the MPEG file until the specified end position is reached. In a seek 
operation, it is assumed that the playback will continue from the target position to the end of the MPEG file. Therefore,
only the begin position parameter of the play command is required for seek operations.
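
The two-command sequence for a seek can be sketched as plain data records. The class names, field names and the convention that a missing end position means "play to the end of the file" are assumptions made for this sketch; the description specifies only that an insert command carries the prefix data and that a play command supports begin and end position parameters.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class InsertCommand:
    """Stop sending from the current position and send `data` instead."""
    data: bytes


@dataclass(frozen=True)
class PlayCommand:
    """Send file data from begin_position up to end_position."""
    begin_position: int
    end_position: Optional[int] = None  # None: continue to end of file (assumed convention)


Command = Union[InsertCommand, PlayCommand]


def seek_commands(prefix_data: bytes, start_pos: int) -> List[Command]:
    """Build the insert + play pair sent to the video pump for a seek."""
    return [InsertCommand(prefix_data), PlayCommand(begin_position=start_pos)]


# PREFIX_DATA and START_POS stand in for values computed from the tag file.
commands = seek_commands(prefix_data=b"\x47\x00\x11\x10", start_pos=60_000)
```

Keeping the end position optional lets the same play record serve operations that send a single frame, where both positions are supplied.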

[0062] Figure 3b illustrates the information sent from video pump 130 to a client (e.g. client 160) in
response to the "insert" and "play" commands transmitted by stream server 110. At the time that the video pump 130
receives the insert command, the video pump 130 will be sending MPEG data from some position in the MPEG file (the
"current position"). Block 320 represents information transmitted by video pump 130 up to the current position. Upon
receiving the insert command, the video pump 130 finishes sending the current transport packet, ceases to transmit
data from the current position and transmits the prefix data 322. After transmitting the prefix data 322 to the client, the
video pump 130 responds to the play command. Specifically, the video pump 130 begins transmission to the client of
data 324 beginning at the target location in the MPEG file.

[0063] There is no interruption in the MPEG data stream transmitted by video pump 130 to the client during this
process. In addition, the MPEG data stream received by the client fully complies with the MPEG standard. Consequently, the
MPEG decoder within the client remains completely unaware that a seek operation was performed. Because seek
operations performed by the technique discussed above produce an MPEG compliant data stream, custom MPEG decoders
are not required.



VI. PREFIX DATA

[0064] As mentioned above, MPEG data is packaged in layers. Clients expect the data stream that they receive from
video pump 130 to be packaged in those same layers. If video pump 130 simply jumps from one point in the MPEG file
104 to another point, packaging information will be lost and the clients will not be able to properly decode the data. For
example, if video pump 130 simply starts transmitting data from point 280 in Figure 2a, the PES header 248 for PES
packet 250 and the header for transport packet 251 will be skipped. These headers contain data which indicates how 
to decode the information which follows them. Consequently, without the information contained in these headers, the 
client will not know how to decode the subsequent data. 

[0065] Therefore, prefix data must be constructed and sent to smoothly transition between the current location in the
MPEG file 104 and a new location. The prefix data contains packaging information which begins packages for the data
at the new location. In the preferred embodiment, the prefix data includes the information described in Table 5.



TABLE 5

DATA | MEANING
DISCARD INFORMATION | For MPEG-2: This is a list of PIDs to keep. All other transport packets are discarded. For MPEG-1: This is a list of elementary streams to keep.
SYSTEM & PACK HEADER DATA (MPEG-1 ONLY) | Includes a valid system header and a valid Pack Header.
TRANSPORT PACKET HEADER DATA (MPEG-2 ONLY) | Includes private data and MPEG video header data, described below.
PRIVATE DATA | Includes a private time stamp and other data described below.
VIDEO INITIALIZATION DATA | Includes an MPEG sequence header which indicates frames per second and horizontal and vertical resolutions.
POSSIBLE EXTRA PADDING AND SECOND TRANSPORT PACKET HEADER (MPEG-2 ONLY) | Explained below.
MPEG VIDEO HEADER | MPEG-2: Includes a valid PES header, a video presentation time and, under certain conditions, discontinuity data which causes the client's clock to be reset. MPEG-1: Contains a valid picture header.



[0066] With respect to the discard information, assume that the target video frame of a seek operation is the video
frame located between points 280 and 282 in Figure 2a. The discard information contained in the insert command
generated in response to the seek operation may instruct video pump 130 to discard all of the non-video packets located
between points 280 and 282. According to one embodiment, the packets are identified by their PID numbers.

[0067] With respect to private data, the mechanism used to convey this data differs between MPEG-1 and MPEG-2.
For MPEG-1, private data is sent in a pack packet on the ISO/IEC private data-1 stream. (See section 2.4.4.2 of ISO/IEC
11172-1 for more information). For MPEG-2, private data is sent in a packet on the video PID, but in a section of the
adaptation field titled private data. (See section 2.4.3.4 of ISO/IEC 13818-1 for more information).

[0068] Since many clients may desire specific information about the operation in progress (seek, fast forward, rewind,
frame advance or rewind) which cannot be encoded in the file's digital audio-visual storage format, private data is used.
When the server knows that "client specific" information is needed, it places it into whatever private data mechanism is
supported by the file's audio-visual storage format. Thus, the output to the network maintains its conformance to the
required format. This is necessary in case the network is performing checks to be sure that data is not corrupted in
transmission. By virtue of being in private data, the "client specific" data will not be checked.

[0069] With respect to the possible extra padding, since transport packets have a fixed size in MPEG-2, an extra
padding packet is required when the prefix data is too large to fit into the same packet as the first block of video data. For
example, assume that point 280 is ten bytes from the beginning of video packet 251. If the prefix data required to
transition to point 280 is greater than ten bytes, then the prefix data will not fit in the same packet as the first block of video
data. Under such circumstances, the prefix data is sent in a transport packet that is completed with padding. A second
transport packet is constructed to transmit the video data located between point 280 and the end of video packet 251.
The first ten bytes in this second transport packet are filled with padding.

[0070] Since MPEG-1 has variable size packets, this issue does not arise for MPEG-1. Rather, a correct packet size
for the prefix data is simply computed.
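
The MPEG-2 padding rule described above amounts to simple byte accounting, sketched below. The function and type names are hypothetical. The sketch treats the prefix as one opaque byte count that already includes any header bytes, it does not model how padding is physically encoded in a transport packet, and the single-packet branch (prefix no larger than the start offset) is an assumption, since the description covers only the case where the prefix does not fit. The 188-byte packet size is the fixed MPEG-2 transport packet size.

```python
from typing import List, NamedTuple

TS_PACKET_SIZE = 188  # bytes; fixed size of an MPEG-2 transport packet


class PacketLayout(NamedTuple):
    prefix_bytes: int
    padding_bytes: int
    video_bytes: int


def plan_transition_packets(prefix_len: int, start_offset: int,
                            packet_size: int = TS_PACKET_SIZE) -> List[PacketLayout]:
    """Lay out the packet(s) that carry the prefix and the first video bytes.

    start_offset is the distance from the start of the original transport
    packet to the first byte of the target frame (start offset 234).
    """
    if not 0 <= start_offset < packet_size:
        raise ValueError("start_offset must lie inside one packet")
    if not 0 <= prefix_len <= packet_size:
        raise ValueError("this sketch handles prefixes of at most one packet")
    video_len = packet_size - start_offset
    if prefix_len <= start_offset:
        # Prefix fits ahead of the video data in a single packet (assumed case).
        return [PacketLayout(prefix_len, start_offset - prefix_len, video_len)]
    # Prefix too large: one padded prefix packet, then a second packet whose
    # first start_offset bytes are padding, followed by the video data.
    return [
        PacketLayout(prefix_len, packet_size - prefix_len, 0),
        PacketLayout(0, start_offset, video_len),
    ]


# Point 280 is ten bytes into packet 251; a 30-byte prefix does not fit.
layout = plan_transition_packets(prefix_len=30, start_offset=10)
```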

VII. PACKET DISCONTINUITIES

[0071] In the original MPEG file 104, each packet has an associated time stamp. Typically, the time stamps of packets
sequentially located within MPEG file 104 will be sequential. During playback operations, the client tracks the time
stamps to determine the integrity of the MPEG data stream. If two sequentially-received packets do not have sequential
time stamps, then the client determines that a discontinuity has occurred. If the difference between two
sequentially-received time stamps is small, then the client can usually compensate for the discontinuity. However, if the difference
between two sequentially-received time stamps is too great, the client may reset itself or initiate some other type of
recovery operation.

[0072] When a seek operation is performed, the client will sequentially receive packets that are not sequentially
located within the MPEG file 104. Because the packets are not sequentially located within MPEG file 104, the time
stamps associated with the packets will not be sequential. If the jump specified by the seek operation is relatively large,
then the discontinuity between the time stamps may be sufficient to cause the client to terminate normal playback. To
avoid this situation, data which causes the client to reset its clock is included in the prefix data. Upon receipt of such
data, the client simply resets its clock based on the time stamp contained in the following packet.

[0073] As noted above, the time stamps of packets sequentially located within an MPEG file will typically be
sequential. However, it is possible to have sequentially stored packets that do not have sequential time stamps. If a large
discontinuity occurs between packets in the original MPEG file, then the original MPEG file will itself contain data which
causes the client's clock to reset. Stream server 110 inspects the discontinuity flags 230 in tag file 106 to determine
whether a particular seek operation will skip any packets which contain data to reset the client's clock. If the seek
operation skips over any discontinuous packets, then data that causes the client's clock to reset is added to the prefix data.
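
The inspection of the discontinuity flags can be sketched as a scan over the tag entries that a seek jumps across. The function name and the use of entry indices are assumptions. Following the definition of discontinuity flag 230, the flag of entry i is taken to describe the boundary between entry i-1 and entry i.

```python
from typing import Sequence


def seek_skips_discontinuity(discontinuity_flags: Sequence[bool],
                             current_index: int, target_index: int) -> bool:
    """Return True if a seek between two tag entries skips a discontinuity.

    discontinuity_flags[i] is the flag stored in tag entry i. The check is
    symmetric, so it covers backward seeks as well as forward seeks.
    """
    lo, hi = sorted((current_index, target_index))
    # Boundaries crossed by the jump are those leading into entries lo+1..hi.
    return any(discontinuity_flags[lo + 1 : hi + 1])


flags = [False, False, True, False, False]
needs_reset = seek_skips_discontinuity(flags, current_index=0, target_index=4)
```

When the result is true, the stream server would add clock-reset data to the prefix data; how that data is encoded depends on the format, as the following paragraphs note.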

[0074] Though in concept the same operation is performed in MPEG-1 and MPEG-2, the mechanism by which the
operation is performed differs because of the different timing mechanisms used in MPEG-1 and MPEG-2. Specifically, in the
MPEG-1 embodiment, the "System Clock Reference" (SCR) is the clock used (see Section 2.4.2 of ISO/IEC 11172-1).

[0075] In the MPEG-2 embodiment, the "Program Clock Reference" (PCR) and "Presentation Time Stamp" (PTS) are
both used. See sections 2.4.2.1 and 2.4.3.6 of ISO/IEC 13818-1 respectively for definitions of the PCR and PTS.


VIII. BUFFER LIMITATIONS 

[0076] The MPEG decoder in each client has a buffer of a certain limited size. Typically the buffer must be large
enough to hold information from two sequential frames of video. Consequently, the data for the later frame of video may
be written into the buffer at the same time that the data for the previous frame of video is being read out of the buffer by
the decoder.

[0077] In many clients, the size of the buffer is selected based on the assumption that the incoming MPEG data
stream will never contain two sequentially-ordered large I-frames of video data. During normal playback from an
MPEG-compliant file, this assumption will hold true, since P and B-frames will occur between successive I-frames. However,
seek operations may cause a jump from a large I-frame located at a first location in the MPEG file 104 to a second
I-frame located at a second location in the MPEG file 104. If an attempt is made to write the second I-frame into the buffer
before the first I-frame has been entirely read from the buffer, the decoder may lose synchronization or otherwise fail.
Stream server 110 detects when a seek operation would cause such an overflow by inspecting the timing buffer
information 238 stored in the tag file 106.

[0078] To avoid such buffer overflow, the stream server 110 inserts data into the prefix data that will cause the arrival
of the second large I-frame to the decoder buffer to be delayed. While the second I-frame is delayed, the client has time
to complete the processing of the first I-frame. By the time the data for the second I-frame begins to arrive, the first
I-frame has been completely processed so that the portion of the buffer used to hold the previous I-frame is available to
hold the second I-frame.

[0079] According to one embodiment, the second I-frame is delayed by placing a delayed time stamp in the transport
packet header portion of the prefix data. The transport packet header portion of the prefix data serves as the header for
the packet that contains the beginning of the second I-frame (the "transition packet"). The transition packet is received
by a network buffer that feeds the decoder buffer. The network buffer determines when to send the video information
contained in the transition packet to the decoder buffer based on the time stamp in the transition packet. Because the
time stamp indicates a delay between the transition packet and the previous packet, the network buffer delays the
transfer of the video information from the transition packet into the decoder buffer.

[0080] According to an alternate embodiment, the second I-frame is delayed by adding padding packets to the prefix
data prior to the data that serves as the heading for the transition packet. Such padding packets will arrive at the client
prior to the transition packet. As the client receives and discards the padding packets, the first I-frame is being read from
the decoder buffer. By the time all of the padding packets have been processed, the first I-frame has been completely
read out of the decoder buffer and the decoder buffer is ready to receive the second I-frame.
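
The description does not say how many padding packets are added. One possible calculation, shown purely as an assumption, is to send enough fixed-size padding packets to occupy the channel for the desired delay at a constant bit rate. The function name, the millisecond delay parameter and the constant-rate model are all hypothetical.

```python
TS_PACKET_BITS = 188 * 8  # bits in one fixed-size MPEG-2 transport packet


def padding_packets_for_delay(delay_ms: int, channel_bps: int) -> int:
    """Number of padding packets needed to fill at least delay_ms of channel time.

    Integer arithmetic is used so that the result is an exact ceiling.
    """
    if delay_ms < 0 or channel_bps <= 0:
        raise ValueError("delay must be non-negative and bit rate positive")
    bits_times_1000 = delay_ms * channel_bps      # channel bits, scaled by 1000
    packet_times_1000 = TS_PACKET_BITS * 1000
    return -(-bits_times_1000 // packet_times_1000)  # ceiling division


# A 50 ms delay on a 2 Megabit per second channel: 100,000 bits of padding.
count = padding_packets_for_delay(delay_ms=50, channel_bps=2_000_000)
```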

IX. SPECIFIED-RATE PLAYBACK OPERATIONS 

[0081] Most video cassette recorders allow viewers to watch analog-based audio-visual works at playback speeds
other than normal 1x forward playback. For example, some video cassette recorders provide multiple rates of fast
forward, slow forward, slow rewind and fast rewind. The present invention provides similar functionality to the viewers of
MPEG-encoded works. In the preferred embodiment, the functionality of typical video cassette recorders is surpassed
in that any speed of forward and rewind playback is supported. For example, a viewer could select 1000x fast forward
or fast rewind, or 0.0001x slow forward or slow rewind.

[0082] In the preferred embodiment, the processes used to implement fast forward, slow forward, slow rewind and
fast rewind operations include the same general steps. Therefore, for the purpose of explanation, these steps shall be
described with reference to a fast forward operation. After the fast forward process is explained, it shall be described
how and when slow motion and rewind operations differ from fast forward operations.

[0083] To initiate a fast forward operation, a client transmits a fast forward request to the stream server 110. In
embodiments that support more than one fast forward rate, the fast forward request includes data designating a presentation
rate. As used herein, "presentation rate" refers to the rate at which the audio-visual work is presented to a viewer.

[0084] The stream server 110 receives the fast forward request from the client and, in response to the request,
inspects the information contained in tag file 106. Specifically, stream server 110 determines from the information in tag
file 106 which frames should be displayed to produce the specified presentation rate. The frame selection process
performed by stream server 110 must take into account various constraints that will be described in greater detail below.

X. BIT BUDGETING

[0085] The simplest method for selecting frames during a fast forward operation would be to select every Nth frame,
where N is the specified presentation rate relative to normal presentation rate. For example, assume that the client
requests a 5x fast forward operation. In response to such a request, stream server 110 could select every fifth frame for
display. Stream server 110 would then transmit a series of play commands to video pump 130 to cause video pump 130
to transmit an MPEG data stream that contains data for every fifth frame. Thus, the presentation rate would be 5x.

[0086] The simple frame selection process described above could work if all of the frames in the MPEG file 104 were
encoded in I-frame format and if either all I-frames were the same size or the bandwidth of network 150 was unlimited.
However, the bandwidth of network 150 is not unlimited, I-frames do not all have the same size and, as explained above,
MPEG files also include frames encoded in P-frame and B-frame formats which cannot be decoded independent of
information from other frames.

[0087] The bandwidth between video pump 130 and its clients is limited. For example, video pump 130 may be
allocated a 1.5 or 2 Megabits per second channel for each MPEG data stream it transmits to a client. To determine whether
selection of a particular frame (the "frame at issue") will exceed the available bandwidth, stream server 110 determines
the size of the time window that will be available to send the particular frame. The size of the time window is equal to
(T2-T1)/PR, where T1 is the time value associated with the previously selected frame, T2 is the time value associated
with the frame at issue, and PR is the current presentation rate. For example, assume that the time associated with the
previously selected frame is one second away from the time of the frame at issue. Assume also that the presentation rate
is 10x. Therefore, the time window for sending the frame at issue would be (1 second)/10 or 0.1 seconds.

[0088] Once the stream server 110 determines the time window available to send the data for the frame at issue, the
stream server 110 determines the current "bit budget" by multiplying the time window by the data transfer rate of the
channel through which the MPEG data stream is being sent to the client. For example, if the applicable data transfer
rate is 2 Megabits per second and the time window is 0.1 seconds, then the current bit budget is 200K bits. The stream
server 110 then reads the frame size from the tag information to determine if the frame at issue falls within the current
bit budget. If the size of the frame at issue exceeds the current bit budget, then the frame at issue is not selected. This
is the case, for example, if the size of the frame data for the frame at issue is 50K bytes (400K bits) and the bit budget
is 200K bits. Otherwise, if the frame at issue falls within the bit budget, then the frame at issue is selected to be sent. If
a particular frame is not sent, then it is more likely that a future frame will be sent, because of the unused timespace
(and thus bits in the bit budget) of the unused frames.
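
The bit budget arithmetic can be written out directly. The function names are hypothetical, and "K" is read as 1000 so that the figures match the worked example in the text (50K bytes equal to 400K bits).

```python
def time_window(t1_s: float, t2_s: float, presentation_rate: float) -> float:
    """Time available to send the frame at issue: (T2 - T1) / PR, in seconds."""
    return (t2_s - t1_s) / presentation_rate


def bit_budget(window_s: float, channel_bps: float) -> float:
    """Bits that can be sent during the time window at the channel's data rate."""
    return window_s * channel_bps


def fits_budget(frame_size_bytes: int, budget_bits: float) -> bool:
    """True if the frame at issue can be sent within the current bit budget."""
    return frame_size_bytes * 8 <= budget_bits


# Worked example from the text: frames one second apart, 10x presentation
# rate, 2 Megabit per second channel.
window = time_window(t1_s=0.0, t2_s=1.0, presentation_rate=10)   # 0.1 seconds
budget = bit_budget(window, channel_bps=2_000_000)               # about 200K bits
large_frame_selected = fits_budget(50_000, budget)               # 400K bits: too large
small_frame_selected = fits_budget(20_000, budget)               # 160K bits: fits
```

Because T1 is the time value of the previously selected frame, each rejected frame leaves T1 unchanged, so the window and the budget grow for the next candidate. This is the effect the text describes as a future frame becoming more likely to be sent.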


XI. FRAME-TYPE CONSTRAINTS 

[0089] As explained above, a frame cannot be accurately recreated from P-frame data unless the preceding I-frame
has been decoded. A frame cannot be accurately recreated from B-frame data unless the preceding and succeeding P
or I-frame data is decoded. Consequently, stream server 110 is limited with respect to which frames it can select.

[0090] Assuming that the bandwidth is available, any I-frame can be selected. According to one embodiment of the
invention, only I-frames are even considered for selection. Stream server 110 accesses the tag information to determine
the frame type of the frame at issue. If the frame at issue is not an I-frame, then it is automatically skipped, and stream
server 110 moves on to evaluate the subsequent frame. At some playback rates, this technique may result in unused
bandwidth. That is, the transmission of every I-frame will require less bandwidth than is available. Therefore, stream
server 110 transmits insert commands to cause video pump 130 to transmit MPEG padding between the transmission
of I-frame information. In the preferred embodiment, the padding packets are sent as one component of suffix data,
which shall be described in greater detail below.

[0091] According to the preferred embodiment, P and B-frames are not automatically skipped in the frame selection
process. Rather, P and B-frames are considered for selection unless information that they require has already been
skipped. Specifically, if any I-frame is not selected by stream server 110, then the frames that fall between the skipped
I-frame and the subsequent I-frame are skipped. In addition, if any P-frame is not selected, then the B and P-frames
that fall between the skipped P-frame and the subsequent I-frame are skipped. Based on these rules, any additional
bandwidth available between the transmission of I-frames may be filled with P-frame and B-frame data. As a result, the
resulting MPEG data stream will have more frames per second.
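
The two skipping rules of the preferred embodiment can be sketched as a single pass over the frame types. The function name is hypothetical, and the `fits` callback is a stand-in for whatever bandwidth test is applied to each frame. The sketch implements only the two rules stated above: it does not model a B-frame's dependence on the following reference frame, and it assumes frames are listed in the order in which they are evaluated.

```python
from typing import Callable, List, Sequence


def select_frames(frame_types: Sequence[str],
                  fits: Callable[[int], bool]) -> List[int]:
    """Return the indices of the frames selected for transmission.

    frame_types holds "I", "P" or "B" per frame; fits(i) reports whether
    frame i can be sent (for example, whether it is within the bit budget).
    """
    selected: List[int] = []
    chain_broken = True  # nothing is decodable until an I-frame is selected
    for i, frame_type in enumerate(frame_types):
        if frame_type == "I":
            # An I-frame starts a fresh dependency chain if it is selected.
            chain_broken = not fits(i)
            if not chain_broken:
                selected.append(i)
        elif chain_broken:
            continue  # required information was skipped earlier
        elif fits(i):
            selected.append(i)
        elif frame_type == "P":
            # A skipped P-frame invalidates frames up to the next I-frame.
            chain_broken = True
    return selected


types = list("IBBPBBPIBP")
# Suppose only the P-frame at index 3 fails the bandwidth test.
chosen = select_frames(types, fits=lambda i: i != 3)
```

In this example the frames at indices 4 to 6 are dropped along with the skipped P-frame, and selection resumes at the I-frame at index 7.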

[0092] According to yet another embodiment, stream server 110 is programmed to skip some I-frames even when the
bandwidth is available to send them. For example, stream server 110 may skip every fifth I-frame that otherwise
qualifies for selection. Because I-frames are significantly larger than P and B-frames, numerous P and B-frames may be sent
in the bandwidth made available by skipping a single I-frame. Consequently, the resulting MPEG data stream has more
frames per second than it would otherwise have if all qualifying I-frames were selected.

[0093] In the preferred embodiment, a client may specify parameters for the selection process performed by stream
server 110. For example, the client may request more frames per second. In response, the stream server 110 transmits
more P and B-frames in the MPEG data stream by increasing the number of qualifying I-frames that it skips. On the
other hand, the client may request a more continuous picture. In response, the stream server 110 transmits a higher
percentage of qualifying I-frames, leaving less bandwidth for transmitting P and B-frames.

XII. SUFFIX DATA 

[0094] While the stream server 110 is selecting the frames to be displayed during a fast forward operation, the stream
server 110 is simultaneously transmitting commands to the video pump 130 to cause the video pump 130 to send an
MPEG video stream containing the frames that have already been selected. The portion of the MPEG data stream used
to convey data for a selected frame is referred to herein as a "segment". To maintain compliance with the MPEG standards,
segments include prefix data that is sent prior to transmitting the frame data for the selected video frames. The
process of generating prefix data was described above with reference to seek operations.

[0095] Performing a fast forward operation is similar to performing a series of seek operations in which each seek 
operation causes the video pump 130 to jump to the data for the next selected frame. Specifically, for each selected 
frame, the stream server 110 must generate prefix data, transmit an insert command to the video pump 130 to cause 
the video pump 130 to insert the prefix data into the data stream, and transmit a play command to the video pump 130
to cause the video pump 130 to transmit data from the appropriate frame.

[0096] Fast forward operations differ from seek operations in that the play command specifies an end position as well 
as a beginning position. The end position is the location within the MPEG file 104 of the last byte of the frame data for 
the selected frame. For example, assume that the frame boundaries for a selected frame F are points 280 and 282 illustrated
in Figure 2a. The stream server 110 would send video pump 130 an insert command to cause video pump 130
to send prefix data to the client and a play command to cause video pump 130 to send the video data located between
points 280 and 282 to the client.

[0097] Typically, the end position (e.g. point 282) specified in the play command will not coincide with a packet boundary.
Therefore, to maintain MPEG compliance, additional information ("suffix data") must be inserted into the data
stream after the transmission of the frame data. The suffix data includes padding which completes the transport packet
that contains the end of the selected frame. For example, the suffix data that would be inserted into the data stream
after sending the frame F would contain a length of padding equal to the distance between point 282 and the end of
video packet 258. Under certain conditions, the suffix data also includes padding packets. As shall be described hereafter,
the number of padding packets sent in the suffix data depends on the size of the frame data, the presentation rate,
the minimum padding rate and the number of padding packets that were left inside the frame data. Thus, a segment
consists of prefix data, the frame data of a selected frame, and suffix data.
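
The length of padding needed to complete the packet that holds the end of a frame can be illustrated with a small calculation. The sketch below assumes MPEG-2 transport packets of 188 bytes that begin at multiples of 188 from the start of the file; the function name and the alignment assumption are illustrative only, and the sketch counts bytes without building an actual compliant packet.

```python
TRANSPORT_PACKET_SIZE = 188  # size of an MPEG-2 transport packet in bytes


def suffix_padding_length(end_pos: int, packet_size: int = TRANSPORT_PACKET_SIZE) -> int:
    """Bytes of padding between the last byte of the frame and the packet end.

    end_pos is the file offset of the last byte of the selected frame.
    Assumes packets are aligned at multiples of packet_size (an assumption).
    """
    if end_pos < 0:
        raise ValueError("end_pos must be non-negative")
    bytes_used_in_packet = (end_pos % packet_size) + 1
    return packet_size - bytes_used_in_packet


# A frame whose last byte is at offset 200 ends 13 bytes into the second packet,
# so 175 bytes are needed to reach the end of that packet.
padding = suffix_padding_length(200)
```

When the frame happens to end on the last byte of a packet, the function returns zero and no completing padding is needed.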

[0098] The stream server 110 generates the suffix data and transmits an insert command to the video pump 130 to
cause the video pump to insert the suffix data into the MPEG data stream. Consequently, during a fast forward operation,
the commands sent by the stream server 110 to the video pump 130 appear as illustrated in Figure 4a. Referring
to Figure 4a, stream server 110 has thus far selected three frames to be displayed: frame_1, frame_2 and frame_3.
Upon selecting frame_1, stream server 110 transmits three commands 402 to the video pump 130. The three commands
402 include a first insert command 408, a play command 410 and a second insert command 412.

[0099] The first insert command 408 instructs video pump 130 to transmit prefix data "PREFIX_DATA_1" to a client.
The play command 410 instructs video pump 130 to transmit the data located between the positions START_POS_1
and END_POS_1 to the client. In the illustrated example, START_POS_1 would be the position of the first byte of
frame_1, and END_POS_1 would be the position of the last byte of frame_1. The second insert command 412 instructs
the video pump 130 to transmit suffix data "SUFFIX_DATA_1" to the client. The data that is specified by these three
commands constitutes a segment for frame_1.
[0100] As explained above, many transport packets may be required to store the frame data for a single video frame
(e.g. frame_1). Other packets that do not contain video information, such as padding packets, timing packets and audio 
packets, may be interspersed between the video packets for the video frame. In the preferred embodiment, stream 
server 110 not only transmits the boundaries of each frame to video pump 130, but stream server 110 also indicates 
what to do with the non-video packets within those boundaries. Typically, the audio packets will be discarded. However,
the other non-video packets may or may not be retained based on various factors. For example, to sustain the minimum
padding rate, stream server 110 may indicate that the padding packets are to be maintained. The value of maintaining
a minimum padding rate shall be discussed in greater detail below. 

[0101] Video pump 130 receives this information from stream server 110 and strips from the MPEG data stream those
non-video packets indicated by the stream server 110. Consequently, the information sent by video pump 130 in
response to play command 410 will typically include less than all of the data located between START_POS_1 and
START_POS_2.

[0102] Referring again to Figure 4a, stream server 110 has transmitted three commands 404 to cause video pump 
130 to transmit a segment for frame_2, and three commands 406 to cause video pump 130 to transmit a segment for 
frame_3. Stream server 110 will continue to transmit commands in this manner to cause video pump 130 to transmit
segments for every frame that it selects to be displayed during the fast forward operation. 

[0103] Figure 4b illustrates the data transmitted by video pump 130 in response to the commands
described above. Specifically, in response to the first insert command 408, video pump 130 transmits PREFIX_DATA_1
450 to the client 160. In response to play command 410, video pump 130 transmits the data located between
START_POS_1 and END_POS_1. This data, illustrated as DATA_1 452, contains the frame data of frame_1. In
response to the second insert command 412, video pump 130 transmits SUFFIX_DATA_1 to the client 160. The segment
consisting of PREFIX_DATA_1, DATA_1 and SUFFIX_DATA_1 conveys the frame data of frame_1 to client 160
while maintaining compliance with the MPEG standards.

[0104] In the preferred embodiment, these commands between the stream server 110 and video pump 130 are sent
over a very fast lightweight network or through shared memory. For a typical stream supporting 15 frames per second
of fast forward, 45 commands per second must be sent (three per frame), thus stressing communications inside the server.
In the preferred embodiment, the commands are sent from the stream server 110 to the video pump 130 in batches.
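
The three-commands-per-frame pattern and the batching mentioned above can be sketched as follows. The class names, field names and batch size are hypothetical and chosen only for this illustration; the text does not specify how commands are encoded or how large a batch is.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass
class Insert:
    data: bytes  # prefix or suffix data to place in the stream


@dataclass
class Play:
    start_pos: int  # position of the first byte of the frame data
    end_pos: int    # position of the last byte of the frame data


Command = Union[Insert, Play]


def segment_commands(prefix: bytes, start_pos: int, end_pos: int, suffix: bytes) -> List[Command]:
    """Insert (prefix), play (frame data), insert (suffix) for one selected frame."""
    return [Insert(prefix), Play(start_pos, end_pos), Insert(suffix)]


def in_batches(commands: List[Command], batch_size: int) -> Iterator[List[Command]]:
    """Group commands so that several are sent in one message (size is an assumption)."""
    for i in range(0, len(commands), batch_size):
        yield commands[i:i + batch_size]


# One second of fast forward at 15 selected frames per second, with made-up positions.
frames = [(b"P%d" % n, 1000 * n, 1000 * n + 499, b"S%d" % n) for n in range(15)]
commands = [c for frame in frames for c in segment_commands(*frame)]
batches = list(in_batches(commands, 9))
```

With 15 selected frames the sketch produces 45 commands, matching the figure given in the text; grouping them nine at a time reduces this to five messages.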

XIII. SLOW MOTION OPERATIONS 

[0105] As explained above, frames are selectively skipped for playback operations that exceed normal playback 
speed. For playback operations that are slower than normal playback speed, no frames are skipped. Rather, stream 
server 110 selects every frame. As in fast forward operations, the video pump 130 transmits segments for each of the
selected frames in response to commands generated by stream server 110. The suffix data in the segments includes
padding packets which delay the arrival of the subsequent segments. Consequently, the frame data arrives and is
decoded at a slower rate than during normal playback operations. Alternatively, the time delays may be imposed by
causing the stream server 110 to insert delayed time stamps into the prefix data that it sends to the video pump 130.

XIV. REWIND OPERATIONS 

[0106] Rewind operations are performed in the same manner as fast and slow forward operations with the exception
that only I-frames are selected for rewind operations (regardless of whether the rewind operations are fast or slow). P
and B-frames are automatically skipped because they cannot be decoded unless frames that precede them in the original
MPEG file are processed before them. However, during rewind operations, the frames on which P and B-frames
depend will be processed after the P and B-frames that depend on them.

[0107] The concept of "multistream" fast forward or rewind has been mentioned above. Multistream fast forward or
rewind is accomplished by storing multiple copies of the movie, where the copies have been recorded at various rates.

[0108] In the preferred embodiment, when a client requests a certain fast forward or rewind presentation rate, the
stream server 110 will determine whether it has a prerecorded file at that rate. If so, it will play that file. This will give the
user more frames per second and will also cause less computational and communication load on the stream server 110
and video pump 130. However, if the requested rate is not available, the stream server 110 will determine the best file
from which to choose individual frames, and will process that file as described above. The best file will be the file which
has the most I-frames to select from at the requested presentation rate.

[0109] This integration of "multi-stream" and "single-stream" fast forward and rewind thus allows servers to choose 
between any level of quality, disk storage requirements, and server computational and communication load, providing
significant advantage over the use of multi-stream operations alone. 
XV. RUNTIME COMMUNICATION

[0110] In the preferred embodiment, stream server 110 is configured to receive and transmit responses to queries
made by clients while video pump 130 is transmitting an MPEG data stream to the clients. The stream server 110 conveys
the responses to the queries to the client by causing video pump 130 to insert the responses into the MPEG data
stream that is being sent to the client. This process is complicated by the fact that the communication channel between
video pump 130 and each client is completely filled by the MPEG data stream that the video pump 130 is sending.

[0111] However, some packets in the MPEG data stream are merely padding, and do not contribute to the resulting
audio-visual display. To take advantage of the bandwidth occupied by these padding packets, the stream server 110
causes video pump 130 to replace these padding packets with data packets that contain responses to the queries.
When the data packets arrive at the client, the MPEG decoder in the client determines that the data packets do not contain
audio-visual data and passes the data packets to a higher level application. The higher level application inspects
the data packets and extracts from the data packets any information contained therein.

[0112] During fast forward and fast rewind operations, the ability of the stream server 110 to communicate with the
client in this manner would be lost if the frame selection process did not leave room for padding packets that may be
replaced with data packets. Therefore, in one embodiment of the invention, the stream server 110 selects frames in
such a way as to ensure some available minimum padding rate. If selection of a frame would cause the padding rate to
fall below the specified minimum rate, then the frame is skipped. The stream server 110 also tells the video pump 130
where to put the requisite padding.

[0113] According to one embodiment, the video pump 130 does not replace padding packets with data packets, but
actually generates the padding packets. The MPEG data stream transmitted by the video pump 130 passes through a
downstream manager 131 prior to arriving at the client. The downstream manager replaces the padding packets with
data packets that contain the responses generated by stream server 110. Because the MPEG data stream maintains a
minimum level of padding, the downstream manager is guaranteed a minimum bandwidth for placing data packets into
the MPEG data stream.

XVI. FRAME ACCURATE POSITIONING 

[0114] For many uses, it is important to be able to determine exactly which frame is being displayed by the client at 
any given time. For example, a user may wish to pause the playback of an MPEG movie, select an item on the screen,
and select a menu option that places an order for the item over the network. If the currently displayed frame is not accurately
identified, then the wrong item may be ordered.

[0115] During normal movie play, frame accurate positioning is encoded as part of the normal MPEG data stream. 
Specifically, time stamps are interleaved with the frame data in the MPEG data stream. Hardware in the client extracts 
this timing information. Typically, numerous frames follow each time stamp. Therefore, the client uniquely identifies the
currently displayed frame based on the last timing information and the number of frames that have been processed 
since receipt of the last timing information. 
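
As a rough illustration of how a client could combine the last time stamp with a frame count during normal play, the sketch below assumes that time stamps count ticks of a 90 kHz clock (as MPEG presentation time stamps do), that the frame rate is a constant whole number, and that the frame carrying the stamp is counted as zero. The function name and these simplifications are assumptions for the example, not details taken from the text.

```python
PTS_CLOCK_HZ = 90_000  # MPEG presentation time stamps use a 90 kHz clock


def displayed_frame(last_pts: int, frames_since: int, fps: int) -> int:
    """Index of the frame on screen during normal (1x) play.

    last_pts: most recent time stamp, in 90 kHz ticks.
    frames_since: frames processed since that stamp (stamped frame counts as 0).
    fps: constant integer frame rate (an assumption of this sketch).
    """
    frame_at_stamp = (last_pts * fps) // PTS_CLOCK_HZ
    return frame_at_stamp + frames_since


# Ten seconds into a 30 frames-per-second movie, three frames after the stamp.
frame_index = displayed_frame(10 * PTS_CLOCK_HZ, 3, 30)
```

This arithmetic only holds while every frame is displayed; once frames are skipped, the count since the last stamp no longer identifies a unique frame.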

[0116] During fast forward and fast rewind, the identity of frames cannot be determined by the timing information contained
in the MPEG data stream. For example, the third frame after a particular time stamp may be one of any number
of frames depending on the current playback rate and frame selection technique. Consequently, to provide frame accurate
positioning, the stream server 110 is configured to insert a time stamp in front of every frame transmitted in the
MPEG data stream. Video pump 130 receives the time stamp information from the stream server 110, which retrieves
the time stamp from the tag file 106.

[0117] Many clients are not able to decode more than a certain number of time stamps per second because the
MPEG specification does not require them to decode more than a certain number of time stamps per second. Therefore,
in the preferred embodiment, the time stamp inserted before each frame is not an MPEG time stamp. Rather, the
time stamps are placed in packets that are tagged as MPEG "private data packets". When a client receives a private
data packet, it determines whether it recognizes the data in the packet. Clients that do not support private data time
stamps simply discard the private data packets containing the time stamps and thus will not be able to do perfect frame
accurate positioning. Such clients will still be able to perform approximate frame positioning based on the MPEG time
stamps that are coincidentally included in the MPEG data stream. Clients that support private data time stamps extract
the time stamps from the private data packets and thus can exactly determine the identity of the frames that follow the
time stamps.

XVII. DISK ACCESS CONSTRAINTS

[0118] In some video playback systems, a single MPEG file may be stored across numerous disk drives, to increase
the fault tolerance of the system. Consider, for example, the multi-disk system 700 illustrated in Figure 7. System 700
includes N+1 disk drives. An MPEG file is stored on N of the N+1 disks. The MPEG file is divided into sections 750, 752,
754 and 756. Each section is divided into N blocks, where N is the number of disks that will be used to store the MPEG
file. Each disk stores one block from a given section.

[0119] In the illustrated example, the first section 750 of the MPEG file includes blocks 710, 712 and 714 stored on
disks 702, 704 and 706, respectively. The second section 752 includes blocks 716, 718 and 720 stored on disks 702,
704 and 706, respectively. The third section 754 includes blocks 722, 724 and 726 stored on disks 702, 704 and 706,
respectively. The fourth section 756 includes blocks 728, 730 and 732 stored on disks 702, 704 and 706, respectively.

[0120] The disk 708 which is not used to store the MPEG file is used to store check bits. Each set of check bits corresponds
to a section of the MPEG file and is constructed based on the various blocks that belong to the corresponding
section. For example, check bits 734 corresponds to section 750 and is generated by performing an exclusive OR operation
on all of the blocks in the first section 750. Similarly, check bits 736, 738 and 740 are the products of an exclusive
OR performed on all of the blocks in the sections 752, 754 and 756, respectively.

[0121] System 700 has a higher fault tolerance than a single disk system in that if any disk in the system ceases to 
operate correctly, the contents of the bad disk can be reconstructed based on the contents of the remaining disks. For 
example, if disk 704 ceases to function, the contents of block 712 can be reconstructed based on the remaining blocks
in section 750 and the check bits 734 associated with section 750. Similarly, block 718 can be constructed based on the
remaining blocks in section 752 and the check bits 736 associated with section 752. This error detection and correction
technique is generally known as "Redundant Array of Inexpensive Disks" or RAID.
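
The exclusive OR construction of the check bits, and the reconstruction of a lost block from the surviving blocks, can be demonstrated in a few lines. The block contents below are made-up four-byte values, and the variable names simply echo the reference numerals of Figure 7; real blocks would be far larger.

```python
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise exclusive OR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Section 750: three data blocks (illustrative contents) and their check bits.
block_710 = b"\x01\x02\x03\x04"
block_712 = b"\x10\x20\x30\x40"
block_714 = b"\xaa\xbb\xcc\xdd"
check_bits_734 = xor_blocks([block_710, block_712, block_714])

# If disk 704 fails, block 712 is recovered from the remaining blocks and check bits.
rebuilt_712 = xor_blocks([block_710, block_714, check_bits_734])
```

Reconstruction works because XOR-ing a value with itself cancels it out, so combining the surviving blocks with the check bits leaves only the missing block. It also shows why every block of a section must be read to recover any one of them.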

[0122] During real-time playback using RAID, a video pump reads and processes the MPEG file on a section by section
basis so that all of the information is available to reconstruct any faulty data read from disk. During normal playback
operations, there is sufficient time to perform the disk accesses required to read an entire section while the data from
the previous section is being transmitted in the MPEG data stream. However, during fast forward and fast rewind operations,
less than all of the data in any section will be sent in the MPEG data stream. Because less data is sent, the transmission
of the data will take less time. Consequently, less time will be available to read and process the subsequent
section.

[0123] For example, assume that only one frame X from section 750 was selected for display during a fast forward 
operation. During the time it takes to transmit the segment for frame X, the data for the next selected frame Y must be
read and processed. Assume that the next frame Y is located in section 752. If the MPEG file is read and processed on
a section by section basis (required for RAID), then all of the blocks in section 752 must be read and processed during
the transmission of the single frame X. Even if it were possible to read and process all of the blocks in section 752 in
the allotted time, it may still be undesirable to do so because of the resources that would be consumed in performing 
the requisite disk accesses. 

[0124] In light of the foregoing, video pump 130 does not use RAID during fast forward and fast rewind operations.
Rather, video pump 130 reads, processes and transmits only the data indicated in the commands it receives from the
stream server 110. Thus, in the example given above, only the frame data for frame Y would be read and processed
during the transmission of the segment for frame X. By bypassing RAID during fast forward and fast rewind operations,
disk bandwidth remains at the same level or below that used during normal playback operations.

[0125] Since RAID is not used during real-time fast forward and fast rewind operations, faulty data cannot be reconstructed
during these operations. Consequently, when the video pump 130 detects that the data for a selected frame is
corrupted or unavailable, the video pump 130 discards the entire segment associated with the problem frame. Thus, if
the data associated with a frame cannot be sent, then the prefix and suffix data for the frame is not sent either. However,
any padding packets that were to be sent along with the prefix or suffix data will still be sent.

[0126] By sending data in entire "segments", conformance with the digital audio-visual format is maintained. In one
embodiment, the video pump 130 will send down padding packets to fill the line to maintain the correct presentation
rate. In the preferred embodiment, this behavior is selectable by the client.

XVIII. VARIABLE RATE PLAYBACK OPERATIONS 

[0127] As mentioned above, a client may change the presentation rate of the audio-visual work by transmitting a rate
change request to the stream server 110. Typically, clients issue change rate requests in response to input received
from a user. For example, a user may press a fast forward button on a remote control. The remote control transmits a
signal that identifies the button that was pressed. The client receives and decodes the signal transmitted by the remote
control to determine that the fast forward button was requested. The client then transmits a change rate request to the
stream server 110 that specifies some presentation rate greater than 1x.

[0128] According to one embodiment of the invention, the client is configured to detect if the user continues to hold
down the fast forward button. If the user holds down the fast forward button for more than a predetermined interval, then
the client transmits a second change rate request that designates a faster presentation rate than the previously
requested presentation rate. While the user continues to hold down the fast forward button, the presentation rate is continuously
increased. Another button, such as the rewind button, may be pressed to incrementally decrease the presentation
rate.

[0129] The process described above appears to the user as a variable rate fast forward operation. However, to the 
stream server 110, the operation actually consists of a series of distinct fast forward operations. This incremental rate
adjustment process has been described with reference to fast forward operations. However, it may equally be applied
to slow forward, slow rewind and fast rewind operations. Further, rate changes may be performed in response to
how many times a particular button is pressed rather than or in addition to how long the button is pressed. In addition,
a visual indication of the current presentation rate, such as an arrow that has a length that reflects the presentation rate, 
may be displayed on the screen while the presentation rate does not equal 1x. 

XIX. NON-INTERACTIVE DIGITAL AUDIO-VISUAL EDITING 

[0130] By initiating seek operations and rate-specified playback operations, a user is effectively performing interactive
MPEG editing. That is, the MPEG data stream that is produced in response to these operations is based on but differs
from the content of the original MPEG file. In addition to such interactive presentation of content, the present invention
provides a mechanism for non-interactive MPEG editing. During non-interactive MPEG editing, an MPEG file is produced
which is based on but differs from one or more pre-existing MPEG files. The mechanism for non-interactive
MPEG editing shall now be described with reference to Figures 5 and 6.

[0131] Referring to Figure 5, an MPEG editor 502 is provided for generating new MPEG sequences based on pre-existing
MPEG content. According to one embodiment, the MPEG editor 502 reads a command file 504 containing editing
commands. The commands contained in the command file 504 include parameters for specifying "splices" from pre-existing
MPEG files. For example, each of the commands in command file 504 may have the following format:

"filename" [start_pos] [end_pos] [presentation_rate]

[0132] In this exemplary command, the "filename" parameter represents a pre-existing MPEG file. The remaining
parameters specify a splice from the specified MPEG file. Specifically, the start_pos parameter represents the position
within the specified MPEG file at which to begin the splice. If no start_pos is designated, it may be assumed that the
splice is to begin at the first frame of the specified MPEG file. The end_pos parameter represents the position at which
to end the splice. If no end_pos is designated, it may be assumed that the splice is to end at the end of the specified
MPEG file. The presentation_rate represents the presentation rate of the splice relative to the original MPEG file. If no
presentation rate is specified, then a normal (i.e. 1x) presentation rate is assumed.
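
A command of this kind, with its defaults, might be parsed as sketched below. The sketch assumes the format is a quoted filename followed by up to three optional numeric fields in the order start_pos, end_pos, presentation_rate, that fields can only be omitted from the right, and that positions are times in seconds; the class and function names are hypothetical, and none of these details are fixed by the text.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Splice:
    filename: str
    start_pos: Optional[float] = None   # None: begin at the first frame
    end_pos: Optional[float] = None     # None: end at the end of the file
    presentation_rate: float = 1.0      # normal (1x) rate when unspecified


def parse_command(line: str) -> Splice:
    """Parse one editing command (format assumed as described in the lead-in)."""
    fields = shlex.split(line)  # handles the quoted filename
    if not fields or len(fields) > 4:
        raise ValueError(f"malformed editing command: {line!r}")
    splice = Splice(filename=fields[0])
    values = [float(field) for field in fields[1:]]
    if len(values) >= 1:
        splice.start_pos = values[0]
    if len(values) >= 2:
        splice.end_pos = values[1]
    if len(values) >= 3:
        splice.presentation_rate = values[2]
    return splice


# A two-minute splice starting ten minutes in, played at twice normal speed.
fast_splice = parse_command('"movie.mpg" 600 720 2')
whole_file = parse_command('"movie.mpg"')
```

A purely positional format cannot express "default start, explicit end"; a real command syntax would need a placeholder or named fields for that case.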

[0133] In the preferred embodiment, the start_pos and end_pos parameters are specified in terms of time because 
timing information is typically more accessible to a user than file position information. For example, a user may want to 
specify a two minute splice that begins ten minutes into a particular MPEG movie and ends twelve minutes into the 
MPEG movie. The user typically will not know the file position of the first byte in the frame that is displayed ten minutes
into the movie, or the last byte in the frame that is displayed twelve minutes into the movie. As shall be explained hereafter,
the MPEG editor 502 determines file positions that correspond to the specified times by inspecting the tag information
for the specified MPEG file.
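
One way to picture the time-to-position lookup is a binary search over per-frame tag entries sorted by time. The entry layout below (time in seconds, first byte, last byte) is a hypothetical simplification of the tag information, and the function names are invented for the example.

```python
import bisect
from typing import List, NamedTuple, Tuple


class TagEntry(NamedTuple):
    time: float      # presentation time of the frame, in seconds (assumed unit)
    start_byte: int  # file position of the first byte of the frame data
    end_byte: int    # file position of the last byte of the frame data


def frame_at(tags: List[TagEntry], t: float) -> TagEntry:
    """Return the frame being displayed at time t (tags sorted by time)."""
    times = [entry.time for entry in tags]
    index = bisect.bisect_right(times, t) - 1
    if index < 0:
        raise ValueError("time precedes the first frame")
    return tags[index]


def splice_byte_range(tags: List[TagEntry], start_time: float, end_time: float) -> Tuple[int, int]:
    """First byte of the starting frame and last byte of the ending frame."""
    return frame_at(tags, start_time).start_byte, frame_at(tags, end_time).end_byte


tags = [
    TagEntry(0.00, 0, 999),
    TagEntry(0.04, 1000, 1399),
    TagEntry(0.08, 1400, 1699),
]
byte_range = splice_byte_range(tags, 0.05, 0.08)
```

Here a start time of 0.05 seconds falls within the second frame, so the splice begins at byte 1000 and runs through the last byte of the third frame.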

[0134] The operation of MPEG editor 502 shall now be described with reference to Figure 6. At step 600, the MPEG
editor 502 reads a command in the command file 504. Preferably the commands are read in the same sequence as they
appear in the command file 504. Therefore, MPEG editor 502 will read the first command in command file 504 the first
time that step 600 is performed.

[0135] At step 602, the MPEG editor 502 determines whether the command specified a 1x presentation rate. If a presentation
rate other than 1x was specified, then control passes to step 604. Steps 604 and 606 are analogous to the
steps performed by stream server 110 and video pump 130 during a specified-rate playback operation. Specifically, at
step 604 MPEG editor 502 selects frames in the specified MPEG file that fall within the specified time period (start_pos
to end_pos). Frames are selected based on the specified presentation rate and the tag information according to the
selection process described in detail above. Once the frames are selected, segments are generated (step 606) which
package the frame data corresponding to the selected frames in MPEG-compliant packets. These segments are stored
in sequence to produce a portion of an edited MPEG file 510. Control then passes to step 612, which either causes the
next command to be processed or the editing operation to end if there are no more commands to be processed.

[0136] If a 1x presentation rate was specified, then control passes from step 602 to step 614. At steps 614 and 616, 
MPEG editor 502 performs an operation analogous to the seek operation described above. Specifically, MPEG editor 
502 compares the specified starting position with the time stamp information contained in the tag file 106 to determine
the position of a target frame. MPEG editor 502 then generates prefix data (step 614) to perform the transition to the
specified frame. After generating the prefix data, MPEG editor 502 copies data from the specified MPEG file into the
edited MPEG file 510 beginning at the start of the target frame (step 616). 

[0137] Once the data between start_pos and end_pos has been copied into edited MPEG file 510, MPEG editor 502
determines whether the splice terminated at the end of the specified MPEG file (step 610). If the splice terminated at
the end of the specified MPEG file, then the splice ended on a packet boundary. Otherwise, suffix data is generated
to complete the current packet (step 618). Control then passes to step 612, which either causes the next
command to be processed or the editing operation to end if there are no more commands to be processed.

[0138] When all of the commands in the command file 504 have been processed by MPEG editor 502, the edited
MPEG file 510 will be an MPEG compliant file containing the splices specified by the commands in the command file
504. Significantly, the edited MPEG file 510 was generated without having to perform additional analog-to-MPEG
encoding. Further, editing may be performed even if one does not have access to any of the analog versions of the original
works. By generating MPEG files in this manner, a user may quickly create unique and original movies based on
preexisting MPEG content.

[0139] Typically, non-interactive MPEG editing does not have to be performed in real-time. Therefore, some of the time
constraints that apply to real-time operations do not apply to non-interactive MPEG editing. For example, it was 
explained above that due to timing constraints RAID error correction techniques are not used during fast forward and 
fast rewind operation. Since such timing constraints do not apply to non-interactive MPEG editing, RAID is used during 
the fast forward and fast rewind operations performed to produce edited MPEG file 510. 

[0140] For the purpose of explanation, the various data repositories used in the editing process are illustrated as files
stored on storage device 140. However, the form and location of this data may vary from implementation to implementation.
For example, the various files may be stored on separate storage devices. Further, a user interface may be provided
which allows a user to operate graphical controls to specify the parameters for a series of splices.

XX. DISTRIBUTED SYSTEM

[0141] As explained above, the tasks performed during the real-time transmission of MPEG data streams are distributed
between the stream server 110 and the video pump 130. The distributed nature of this architecture is enhanced
by the fact that the video pump 130 does not require access to tag file 106, and stream server 110 does not require
access to MPEG file 104. Consequently, stream server 110 and video pump 130 may operate in different parts of the
network without adversely affecting the efficiency of the system 100.

[0142] In the foregoing specification, the invention has been described with reference to specific embodiments 
thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from 
the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative
rather than a restrictive sense.

Claims 

1. A method for creating a second digital video stream from one or more other digital video streams, the method comprising
the computer-implemented steps of:

receiving a series of editing commands, each editing command in said series of editing commands specifying 
a start position, an end position, and a presentation rate; 

for each editing command in said series of editing commands, performing the steps of
selecting a selected set of video frames between said start position and said end position in said one or more
other digital video streams based on said presentation rate; and

storing frame data corresponding to said selected set of video frames in said second digital video stream. 

2. The method of Claim 1 wherein: 

said one or more other digital video streams includes a plurality of other digital video streams;
each editing command in said series of editing commands specifies one of said other digital video streams;
said step of selecting said selected set of video frames includes selecting video frames that are represented
by data that is located between said start position and said end position in the digital video stream specified in
said editing command.

3. The method of Claim 1 wherein: 

said start position indicates a first amount of elapsed time; and 
said end position indicates a second amount of elapsed time.

4. The method of Claim 1 wherein the step of selecting said selected set of video frames between said start position 
and said end position in said one or more other digital video streams based on said presentation rate comprises
the steps of:

determining a bit budget based on a first time value associated with a most recently selected video frame, a 
second time value associated with a current frame, said presentation rate and a data transfer rate; 
determining a size of the frame data that corresponds to the current frame;

if the size of the frame data that corresponds to the current frame exceeds said bit budget, then 
not selecting said current frame as a video frame in said selected set of video frames, and 
selecting a new frame as a new current frame; and 

if the size of the frame data that corresponds to the current frame does not exceed said bit budget, then 
selecting said current frame as a video frame in said selected set of video frames.

5. The method of Claim 1 further comprising storing data between said frame data corresponding to said selected set 
of video frames to cause said second digital video stream to conform to a predetermined format. 

6. The method of Claim 5 wherein:

said predetermined format is MPEG-2; and 

said step of storing data between said frame data includes storing data to serve as a valid transport packet 
header and PES packet header. 

7. The method of Claim 5 wherein: 

said predetermined format is MPEG-1; and

said step of storing data between said frame data includes storing data to serve as a valid pack header and 
system header.

8. The method of Claim 5 further comprising the step of removing one or more non-video packets from frame data 
corresponding to a frame of said selected set of video frames prior to storing said frame data in said second digital 
video stream. 

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